In Search of a Dataset for Handwritten Optical Music Recognition: Introducing MUSCIMA++

Authors

  • Jan Hajic
  • Pavel Pecina

Abstract

Optical Music Recognition (OMR) has long lacked an adequate dataset and ground truth for evaluating OMR systems, which has made it difficult to establish a state of the art in the field. Furthermore, machine learning methods require training data. We analyze how the OMR processing pipeline can be expressed in terms of gradually more complex ground truth, and based on this analysis we present the MUSCIMA++ dataset of handwritten music notation, which addresses musical symbol recognition and notation reconstruction. MUSCIMA++ v0.9 consists of 140 pages of handwritten music, with 91,255 manually annotated notation symbols and 82,261 explicitly marked relationships between symbol pairs. The dataset allows training and evaluating models for symbol classification, symbol localization, and notation graph assembly, both in isolation and jointly. Open-source tools are provided for manipulating the dataset, visualizing the data, and annotating further, and the dataset itself is made available under an open license.
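
The abstract describes the annotations as a notation graph: each annotated symbol is a node with a class and a location, and each marked relationship between a pair of symbols is an edge. A minimal sketch of that structure follows, together with one plausible way to score graph assembly. Everything here is hypothetical and for illustration only: the names `Symbol`, `NotationGraph` and `edge_precision_recall`, the class labels, the notehead-to-stem edge direction, and the choice of edge-level precision/recall as the metric are assumptions, not the MUSCIMA++ file format or the authors' tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Symbol:
    """One annotated notation symbol, i.e. a node in the notation graph (hypothetical layout)."""
    obj_id: int
    class_name: str  # illustrative labels such as "notehead-full" or "stem"
    top: int
    left: int
    height: int
    width: int

    @property
    def bbox(self) -> Tuple[int, int, int, int]:
        """Bounding box as (top, left, bottom, right) in pixel coordinates."""
        return (self.top, self.left, self.top + self.height, self.left + self.width)


@dataclass
class NotationGraph:
    """Symbols as nodes; relationships between symbol pairs as directed edges."""
    symbols: Dict[int, Symbol] = field(default_factory=dict)
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def add_symbol(self, symbol: Symbol) -> None:
        self.symbols[symbol.obj_id] = symbol

    def add_edge(self, src: int, dst: int) -> None:
        # A relationship may only connect two symbols that are already annotated.
        if src not in self.symbols or dst not in self.symbols:
            raise KeyError("both endpoints must be annotated symbols")
        self.edges.add((src, dst))

    def children(self, obj_id: int) -> List[int]:
        """IDs of symbols that obj_id points to, sorted for stable output."""
        return sorted(dst for src, dst in self.edges if src == obj_id)


def edge_precision_recall(predicted: Set[Tuple[int, int]],
                          gold: Set[Tuple[int, int]]) -> Tuple[float, float]:
    """Assumed metric: precision and recall of predicted edges against gold edges."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Usage: a notehead attached to its stem and a beam (edge direction is assumed).
g = NotationGraph()
g.add_symbol(Symbol(0, "notehead-full", top=100, left=40, height=12, width=14))
g.add_symbol(Symbol(1, "stem", top=60, left=52, height=48, width=2))
g.add_symbol(Symbol(2, "beam", top=55, left=52, height=8, width=40))
g.add_edge(0, 1)
g.add_edge(0, 2)

print(g.children(0))                                       # [1, 2]
print(edge_precision_recall({(0, 1), (1, 2)}, g.edges))    # (0.5, 0.5)
```

Keeping edges as a set of ID pairs makes scoring a predicted graph a plain set intersection; symbol classification and localization can be evaluated separately on the node attributes, matching the "in isolation and jointly" framing above.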

Similar resources

Handwritten Music Object Detection: Open Issues and Baseline Results

Optical Music Recognition (OMR) is the challenge of understanding the content of musical scores. Accurate detection of individual music objects is a critical step in processing musical documents, because a failure at this stage corrupts any further processing. So far, all proposed methods were either limited to typeset music scores or were built to detect only a subset of the available classes ...

CVC-MUSCIMA: A Database of Handwritten Music Score Images for Writer Identification and Staff Removal

The analysis of music scores has been an active research field in the last decades. However, there are no publicly available databases of handwritten music scores for the research community. In this paper we present the CVC-MUSCIMA database of handwritten music score images. It consists of 1,000 music sheets written by 50 different musicians. The dataset has been especially designed for writer id...

Detecting Noteheads in Handwritten Scores with ConvNets and Bounding Box Regression

Noteheads are the interface between the written score and music. Each notehead on the page signifies one note to be played, and detecting noteheads is thus an unavoidable step for Optical Music Recognition. Noteheads are clearly distinct objects; however, the variety of music notation handwriting makes noteheads harder to identify, and while handwritten music notation symbol classification is a...

Towards the Alignment of Handwritten Music Scores

It is very common to find different versions of the same music work in archives of Opera Theaters. These differences correspond to modifications and annotations from the musicians. From the musicologist's point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such dif...

Handwritten Optical Music Recognition

Optical Music Recognition (OMR) is a field of document analysis that aims to automatically read music scores. Music notation encodes music in a graphical form; OMR backtracks through this process to extract the musical information from this graphical representation. Common western music notation (CWMN) is an intricate system for visually representing music that has shown considerable resilience...

Journal: CoRR

Volume: abs/1703.04824

Publication date: 2017